32 research outputs found

    Modeling power consumption of 3D MPDATA and the CG method on ARM and Intel multicore architectures

    Get PDF
    We propose an approach to estimate the power consumption of algorithms, as a function of the frequency and number of cores, using only a very reduced set of real power measures. In addition, we also provide the formulation of a method to select the voltage–frequency scaling–concurrency throttling configurations that should be tested in order to obtain accurate estimations of the power dissipation. The power models and selection methodology are verified using two real scientific application: the stencil-based 3D MPDATA algorithm and the conjugate gradient (CG) method for sparse linear systems. MPDATA is a crucial component of the EULAG model, which is widely used in weather forecast simulations. The CG algorithm is the keystone for iterative solution of sparse symmetric positive definite linear systems via Krylov subspace methods. The reliability of the method is confirmed for a variety of ARM and Intel architectures, where the estimated results correspond to the real measured values with the average error being slightly below 5% in all cases

    Proceedings of the Second International Workshop on Sustainable Ultrascale Computing Systems (NESUS 2015) Krakow, Poland

    Get PDF
    Proceedings of: Second International Workshop on Sustainable Ultrascale Computing Systems (NESUS 2015). Krakow (Poland), September 10-11, 2015

    Exploring OpenMP Accelerator Model in a real-life scientific application using hybrid CPU-MIC platforms

    Get PDF
    Proceedings of: Third International Workshop on Sustainable Ultrascale Computing Systems (NESUS 2016). Sofia (Bulgaria), October, 6-7, 2016.The main goal of this paper is the suitability assessment of the OpenMP Accelerator Model (OMPAM) for porting a real-life scientific application to heterogeneous platforms containing a single Intel Xeon Phi coprocessor. This OpenMP extension is supported from version 4.0 of the standard, offering an unified directive-based programming model dedicated for massively parallel accelerators. In our study, we focus on applying the OMPAM extension together with the OpenMP tasks for a parallel application which implements the numerical model of alloy solidification. To map the application efficiently on target hybrid platforms using such constructs as omp target, omp target data and omp target update, we propose a decomposition of main tasks belonging to the computational core of the studied application. In consequence, the coprocessor is used to execute the major parallel workloads, while CPUs are responsible for executing a part of the application that do not require massively parallel resources. Effective overlapping computations with data transfers is another goal achieved in this way. The proposed approach allows us to execute the whole application 3.5 times faster than the original parallel version running on two CPUs.This research was conducted with the support of COST Action IC1305 (NESUS), as well as the National Science Centre (Poland) under grant no. UMO-2011/03/B/ST6/03500. The authors are grateful to the Czestochowa University of Technology for granting access to Intel Xeon Phi coprocessors provided by the MICLAB project no. POIG.02.03.00.24-093/13 (http://miclab.pl)

    An experience report on (auto-)tuning of mesh-based PDE solvers on shared memory systems.

    Get PDF
    With the advent of manycore systems, shared memory parallelisation has gained importance in high performance computing. Once a code is decomposed into tasks or parallel regions, it becomes crucial to identify reasonable grain sizes, i.e. minimum problem sizes per task that make the algorithm expose a high concurrency at low overhead. Many papers do not detail what reasonable task sizes are, and consider their findings craftsmanship not worth discussion. We have implemented an autotuning algorithm, a machine learning approach, for a project developing a hyperbolic equation system solver. Autotuning here is important as the grid and task workload are multifaceted and change frequently during runtime. In this paper, we summarise our lessons learned. We infer tweaks and idioms for general autotuning algorithms and we clarify that such a approach does not free users completely from grain size awareness

    Fast DEM collision checks on multicore nodes.

    Get PDF
    Many particle simulations today rely on spherical or analytical particle shape descriptions. They find non-spherical, triangulated particle models computationally infeasible due to expensive collision detections. We propose a hybrid collision detection algorithm based upon an iterative solve of a minimisation problem that automatically falls back to a brute-force comparison-based algorithm variant if the problem is ill-posed. Such a hybrid can exploit the vector facilities of modern chips and it is well-prepared for the arising manycore era. Our approach pushes the boundary where non-analytical particle shapes and the aligning of more accurate first principle physics become manageable

    Idźmy na Kupałę! [dokument dźwiękowy]

    No full text
    20min, format mp3, 128 kbit/s

    V Ogólnopolski Konkurs Poezji Zaangażowanej o nagrodę "Czerwonej róży" [dokument dźwiękowy]

    No full text
    15 min, format mp3, 128 kbit/s

    Kultura studencka w roku 1965 [dokument dźwiękowy]

    No full text
    28 min, format mp3, 128 kbit/s
    corecore